Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems.

Authors

  • Wouter Kool
  • Samuel J Gershman
  • Fiery A Cushman
Abstract

Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits. To investigate this proposal, we conducted two experiments showing that people increase model-based control when it achieves greater accuracy than model-free control, and especially when the rewards of accurate performance are amplified. In contrast, they are insensitive to reward amplification when model-based and model-free control yield equivalent accuracy. This suggests that humans adaptively balance habitual and planned action through on-line cost-benefit analysis.
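The cost-benefit comparison described in the abstract can be illustrated with a toy sketch. This is not the authors' computational model: the function name `arbitration_weight`, the logistic form, and the parameters `stakes`, `cost_mb`, and `temperature` are all assumptions made for illustration. It only reproduces the qualitative pattern reported above, where reward amplification matters only if model-based control is more accurate than model-free control.

```python
import math


def arbitration_weight(acc_mb, acc_mf, stakes, cost_mb, temperature=1.0):
    """Toy probability of relying on model-based control (hypothetical form).

    acc_mb, acc_mf : expected accuracy (reward rate) of each system
    stakes         : multiplier applied to rewards (reward amplification)
    cost_mb        : assumed extra effort cost of planning
    """
    # Net value of planning: amplified accuracy advantage minus effort cost.
    net_benefit = stakes * (acc_mb - acc_mf) - cost_mb
    # Logistic squashing turns the net benefit into a weight in (0, 1).
    return 1.0 / (1.0 + math.exp(-net_benefit / temperature))


# Planning is more accurate: higher stakes shift control toward planning.
low = arbitration_weight(0.8, 0.6, stakes=1, cost_mb=0.1)
high = arbitration_weight(0.8, 0.6, stakes=5, cost_mb=0.1)

# Equal accuracy: the weight does not depend on the stakes.
flat_low = arbitration_weight(0.7, 0.7, stakes=1, cost_mb=0.1)
flat_high = arbitration_weight(0.7, 0.7, stakes=5, cost_mb=0.1)
```

In this sketch the stakes multiply only the accuracy difference, so amplifying rewards has no effect when the two systems perform equally well.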


Similar resources

A Modular Reinforcement Learning Framework for Interactive Narrative Planning

A key functionality provided by interactive narrative systems is narrative adaptation: tailoring story experiences in response to users’ actions and needs. We present a data-driven framework for dynamically tailoring events in interactive narratives using modular reinforcement learning. The framework involves decomposing an interactive narrative into multiple concurrent sub-problems, formalized ...

Switch Packet Arbitration via Queue-Learning

In packet switches, packets queue at switch inputs and contend for outputs. The contention arbitration policy directly affects switch performance. The best policy depends on the current state of the switch and current traffic patterns. This problem is hard because the state space, possible transitions, and set of actions all grow exponentially with the size of the switch. We present a reinforce...

Formalizing Assistive Teleoperation

In assistive teleoperation, the robot helps the user accomplish the desired task, making teleoperation easier and more seamless. Rather than simply executing the user’s input, which is hindered by the inadequacies of the interface, the robot attempts to predict the user’s intent, and assists in accomplishing it. In this work, we are interested in the scientific underpinnings of assistance: we f...

Cooperation in Stochastic Games

The aim of this study is to explore the phenomenon of cooperative learning in the multi-agent stochastic game of Keepout. We intend to investigate whether, for a given number of reinforcement learning agents, cooperative agents can outperform independent agents who do not communicate during learning. In that regard, we would like to work towards quantifying the benefits of cooperation in differen...

A Novel Model for Arbitration between Planning and Habitual Control Systems

It is well established that human decision making and instrumental control use multiple systems, some of which use habitual action selection and some of which require deliberate planning. Deliberate planning systems predict action outcomes using an internal model of the agent’s environment, while habitual action selection systems learn to automate by repeating previously rewarded actions...

Journal title:
  • Psychological Science

Volume 28, Issue 9

Pages  -

Publication date 2017